Abstract

Background: Although molecular tools are increasingly employed to decipher invertebrate systematics, earthworm 
(Annelida: Clitellata: 'Oligochaeta') taxonomy is still largely based on conventional dissection, resulting in data 
that are mostly unsuitable for dissemination through online databases. In order to evaluate if micro-computed 
tomography (µCT) in combination with soft tissue staining techniques could be used to expand the existing set of 
tools available for studying internal and external structures of earthworms, µCT scans of freshly fixed and museum 
specimens were gathered. 

Findings: Scout images revealed full penetration of tissues by the staining agent. The attained isotropic voxel 
resolutions permit identification of internal and external structures conventionally used in earthworm taxonomy. 
The µCT projection and reconstruction images have been deposited in the online data repository GigaDB and are 
publicly available for download. 

Conclusions: The dataset presented here shows that earthworms constitute suitable candidates for µCT scanning 
in combination with soft tissue staining. Not only are the data comparable to results derived from traditional 
dissection techniques, but due to their digital nature the data also permit computer-based interactive exploration 
of earthworm morphology and anatomy. The approach pursued here can be applied to freshly fixed as well as 
museum specimens, which is of particular importance when considering the use of rare or valuable material. Finally, 
a number of aspects related to the deposition of digital morphological data are briefly discussed. 
Data description

Purpose of data acquisition 

The present dataset constitutes the first attempt at
comparative micro-computed tomography (µCT) scanning
of earthworm (Annelida: Clitellata: 'Oligochaeta')
specimens. When used in combination with staining
techniques that enhance soft tissue contrast [1], µCT
could become a promising technique for resolving
pervasive issues in earthworm taxonomy and systematics.
To this end, the application of µCT to freshly fixed
and museum specimens was evaluated, and results were
compared with data derived from traditional dissection
techniques. The main methodological and taxonomical
results of the study are presented in an accompanying
publication [2].

The aim of the present report is to provide the earth- 
worm research community with a reference dataset for 
future analyses of soft-bodied organisms based on non- 
destructive imaging techniques. In addition, uninhibited 
data access and enforced data deposition, as practiced 
here, are briefly discussed. 

Scanned specimens 

Scans of four lumbricid ('Oligochaeta': Lumbricidae) 
earthworm specimens are part of the present dataset. 
One freshly fixed and one museum specimen (stored in 
ethanol for several decades) were scanned for each of 
the two different species employed in the study, i.e. 
Aporrectodea caliginosa (Savigny, 1826) and Aporrectodea
trapezoides (Dugès, 1828). All four specimens were
stained using an ethanol-based phosphotungstic acid
(PTA) solution, which was adapted from protocols
described previously [3]. In order to increase the isotropic
voxel resolution of the three-dimensional (3D) image 
stack, only the first ca. 35 segments of each specimen 
were scanned. These segments harbor all internal and 
external structures commonly used in earthworm tax- 
onomy. Specific specimen data and supplementary 
image files have been deposited in the publicly acces- 
sible database of the Museum of Comparative Zoology, 
MCZbase (http://mczbase.mcz.harvard.edu/). In addition, 
hyperlinks to each specimen entry in MCZbase are pro- 
vided on the dataset website in the GigaScience Database 
(GigaDB) online repository [4]. 

Data acquisition and processing 

The four scans were produced using a µCT system
equipped with a cone-beam tungsten X-ray source
(SkyScan 1173, Bruker microCT, Kontich, Belgium). The
specific scanning parameters are provided in the
accompanying publication [2], and can also be found in the log
file (.log) of each dataset folder available for download
at GigaDB [4].

Each scan resulted in a set of 960 projection images
in tagged image file format (TIFF, .tif). No binning
protocols were employed during data acquisition. The
projection images covered 2240 × 2240 pixels at 16-bit
dynamic range. Reconstruction of the two-dimensional (2D)
projection images into a 3D volumetric image stack
was performed using the software NRecon 1.6.6.0
(Bruker microCT, Kontich, Belgium). This program runs
under the reconstruction engine NReconServer 1.6.6,
which employs a Feldkamp algorithm for volumetric
reconstruction [5]. The two reconstruction parameters with
significant effect on the quality of the final data were ring
artifact and beam hardening correction. The output
format for the 3D volumetric image stacks was bitmap image
file (BMP, .bmp) at 8-bit dynamic range and 2240 ×
2240 pixel size. In order to reduce final file size, the
volume of interest (VOI) function, a 3D cropping tool, was
used to remove all uninformative parts of the data
following reconstruction. This resulted in changes to the pixel
dimensions of each reconstructed image stack, but did not
lead to spatial distortions in any of the three dimensions.
Further information on the contents and size of both the
projection and the reconstruction data folders is provided
in Table 1.
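
The projection figures given above can be cross-checked with simple arithmetic. The following Python sketch multiplies the number of projections by the stated image dimensions and bit depth. It counts pixel data only, ignoring TIFF headers and the accompanying log and preview files, and it assumes that the sizes reported in Table 1 use binary gigabytes (the text does not state this).

```python
# Rough size of the pixel data in one projection folder, from figures in the text.
N_PROJECTIONS = 960      # projection images per scan
WIDTH = HEIGHT = 2240    # pixels per projection image
BYTES_PER_PIXEL = 2      # 16-bit dynamic range


def raw_pixel_bytes(n_images, width, height, bytes_per_pixel):
    """Return the pixel payload in bytes, ignoring file headers and metadata."""
    return n_images * width * height * bytes_per_pixel


total = raw_pixel_bytes(N_PROJECTIONS, WIDTH, HEIGHT, BYTES_PER_PIXEL)
# Assumption: 1 GB is taken as 2**30 bytes.
print(f"{total} bytes = {total / 2**30:.2f} GiB")
# prints: 9633792000 bytes = 8.97 GiB
```

Under these assumptions the result lies just below the uncompressed projection folder size of 8.98 GB listed in Table 1, which is plausible once headers and auxiliary files are included.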

Table 1 Overview of the earthworm dataset deposited in GigaDB

Specimen                  Isotropic voxel  Projection    Projection             Reconstruction  Reconstruction
                          resolution       folder files  folder size            folder files    folder size

Aporrectodea trapezoides  8.17 µm          960 × .tif    Uncompressed: 8.98 GB  2196 × .bmp     Uncompressed: 1.61 GB
MCZ IZ 24804                               17 × .crv     Compressed: 7.19 GB    1 × .log        Compressed: 0.66 GB
Freshly fixed specimen                     1 × .log                             1 × .bmp
                                           1 × .tif

Aporrectodea caliginosa   9.95 µm          960 × .tif    Uncompressed: 8.98 GB  2194 × .bmp     Uncompressed: 1.18 GB
MCZ IZ 24805                               33 × .crv     Compressed: 8.03 GB    1 × .log        Compressed: 0.37 GB
Freshly fixed specimen                     1 × .log                             1 × .bmp
                                           1 × .tif
                                           1 × .roi

Aporrectodea caliginosa   13.15 µm         960 × .tif    Uncompressed: 8.98 GB  2226 × .bmp     Uncompressed: 0.70 GB
MCZ IZ 95557                               14 × .crv     Compressed: 7.67 GB    1 × .log        Compressed: 0.21 GB
Museum specimen                            1 × .log                             1 × .bmp
                                           1 × .tif
                                           1 × .crv

Aporrectodea trapezoides  8.17 µm          960 × .tif    Uncompressed: 8.98 GB  2226 × .bmp     Uncompressed: 2.87 GB
MCZ IZ 95901                               1 × .log      Compressed: 8.37 GB    1 × .log        Compressed: 0.60 GB
Museum specimen                            1 × .tif                             1 × .bmp
                                           9 × .crv                             1 × .crv

Explanation of the file types: .bmp = reconstructed images (multiple files), reference reconstruction (single file); .crv = preview file when setting projection or
reconstruction parameters; .log = log file listing scan parameters; .roi = 2D region of interest (ROI) used to create a 3D volume of interest; .tif = projection images
(multiple files), reference projection (single file).

Data quality

The quality of the data was ascertained through visual
inspection of the scout projection and reconstruction
images. Primary criteria were i) the full penetration of
the sample by the staining agent and ii) the absence of
artifacts. Although a total of eight scans were obtained 
in the course of the study, four of these scans were ei- 
ther trial scans or showed significant artifacts [2]. There- 
fore, only the four most representative scans have been 
deposited in GigaDB. Nonetheless, these scans represent 
the full taxonomic and morphological breadth of species 
and sample types included in the study. The imagery 
allows for an identification of numerous internal and 
external structures. No significant difference in the ap- 
proach was observed when employing freshly fixed or mu- 
seum specimens, nor between the two species analyzed. 

Potential uses 

The potential uses of the dataset presented here include 
morphometric or volumetric analyses of internal organs, 
studies of ingested sediment particles, the possibility of 
online collaborative dataset annotation, or interactive 
data exploration using digital 2D and 3D visualization 
tools. 
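
Because the voxels are isotropic, a voxel count obtained from a segmented structure converts directly into a physical volume. The following Python sketch illustrates this conversion for a volumetric analysis. The voxel count used in the example is purely hypothetical, the function names are illustrative, and the segmentation step itself is not shown.

```python
def voxel_volume_um3(edge_um):
    """Volume of one isotropic voxel in cubic micrometres."""
    return edge_um ** 3


def structure_volume_mm3(voxel_count, edge_um):
    """Convert a voxel count into cubic millimetres (1 mm^3 = 1e9 µm^3)."""
    return voxel_count * voxel_volume_um3(edge_um) * 1e-9


# Hypothetical example: one million segmented voxels at 8.17 µm resolution.
print(f"{structure_volume_mm3(1_000_000, 8.17):.3f} mm^3")
# prints: 0.545 mm^3
```

The same count measured on a scan with 13.15 µm voxels would correspond to a larger volume, so the voxel resolution of each scan has to be taken from Table 1 or from the respective log file.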

The methodological approach itself is suitable for high- 
throughput scanning of hundreds or even thousands of 
earthworm specimens as well as other soft-bodied or- 
ganisms [2]. This would result in large morphological 
taxon sampling, one of the prerequisites for broad taxo- 
nomic and systematic studies. Furthermore, non-invasive 
imaging techniques such as µCT leave specimens intact 
and generate digital data suitable for online dissemination, 
an important condition for effective data mining. 

Availability and requirements 

Data availability 

The dataset is available at GigaDB and has a citable 
digital object identifier (DOI) [4]. Each of the eight folders 
has been packed using tape archiver (tar, .tar), before be- 
ing compressed using GNU zip (gzip, .gz). The folders can 
be individually downloaded using a set of tools, e.g. File 
Transfer Protocol (FTP). 
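
Once a folder has been downloaded, the gzip-compressed tar archive has to be unpacked before the images can be viewed. A minimal Python sketch using the standard library module tarfile is given below. The function name and the archive name in the usage comment are hypothetical and do not correspond to actual file names in GigaDB.

```python
import tarfile
from pathlib import Path


def unpack_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # "r:gz" opens a tar archive that was compressed with gzip.
    with tarfile.open(archive_path, mode="r:gz") as archive:
        names = archive.getnames()
        archive.extractall(dest)
    return names


# Usage (hypothetical file name):
# unpack_archive("reconstruction_folder.tar.gz", "earthworm_data")
```

Archives should only be extracted in this way when they come from a trusted source, as is the case for a curated repository.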

Dataset name: MicroCT scans of freshly fixed and mu- 
seum earthworm specimens 
Operating system: Platform-independent 
License: Creative Commons 0 (CC0) public domain
dedication (https://creativecommons.org/publicdomain/zero/1.0/)

Data requirements 

Following download, the reconstructed images can, for
example, be rapidly visualized using the 'File | Import |
Image Sequence' command chain in the Java-based
imaging software ImageJ (http://imagej.nih.gov/ij/). In
addition, numerous other 2D and 3D visualization tools
are available for free [6]. Given the size of the
reconstructed image folders, a computer system with about
4 GB main random access memory (RAM) and 1 GB
video RAM should be used.
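
Whether a given reconstructed image stack fits into the available RAM can be estimated before loading it. The following Python sketch reads the width, height and bit depth from the header of each .bmp file in a folder and sums the resulting pixel data. It assumes that the files use the common 40-byte BITMAPINFOHEADER layout, which has not been verified for the files in this dataset. The function names are illustrative.

```python
import struct
from pathlib import Path


def bmp_dimensions(path):
    """Read width, height and bits per pixel from a BMP file header."""
    with open(path, "rb") as handle:
        header = handle.read(30)
    if len(header) < 30 or header[:2] != b"BM":
        raise ValueError(f"{path} is not a BMP file")
    # Assumption: BITMAPINFOHEADER, with width and height stored as
    # little-endian 32-bit integers at byte offsets 18 and 22.
    width, height = struct.unpack_from("<ii", header, 18)
    (bits_per_pixel,) = struct.unpack_from("<H", header, 28)
    # A negative height only indicates top-down row order.
    return width, abs(height), bits_per_pixel


def stack_bytes(folder):
    """Estimate the bytes of pixel data in all .bmp files of a folder."""
    total = 0
    for path in sorted(Path(folder).glob("*.bmp")):
        width, height, bits = bmp_dimensions(path)
        total += width * height * bits // 8
    return total
```

The estimate covers all .bmp files in the folder, including the single reference reconstruction, and does not account for the additional memory that visualization software requires for rendering.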

Discussion 

The dataset presented here permits full open access both 
to µCT-derived raw data (here: the projection images) as 
well as derivative data (here: the reconstructed image 
stacks). The availability of µCT raw data files has been 
deemed important, primarily due to the rapid increase 
in the performance of reconstruction algorithms, which 
in the future could lead to improved data reconstruction 
[7]. Furthermore, one reviewer as well as the editor of 
the accompanying publication [2] requested data de- 
position for purposes of data transparency, which was 
achieved here through storage and archiving of the data- 
set in GigaDB [4]. Despite these advances, a lack of coher- 
ent policy for data archiving and enforced data deposition 
in digital morphology remains [8], and metadata standards 
for data gathered using non-invasive imaging techniques 
are still not available [7]. 

Availability of supporting data 

The dataset supporting the results of this article is 
available in the GigaScience Database online reposi- 
tory [4]. 